This project's goal is to analyze which free/open hotspots and road sections are located. As a result, we are able to pinpoint the areas and routes with the best internet infrastructure for testing new services. Koln will serve as my use case. It will be simpler to choose the locations for new services, such as speed monitoring, traffic light smart atomation, or any IOT application related to smart cities, if you have access to the road sections with the appropriate internet infrastructure. This site will make it feasible to test new technologies without having to worry about building the necessary infrastructure.
import pandas as pd # For handling tables
import re # For data preparation
import haversine as hs # For distance calculations
from haversine import Unit # For distance calculations
import plotly.io as pio # For poltting corrdinates over maps
import plotly.express as px # For poltting corrdinates over maps
import copy
Our Dataset consists of two tables : | Table name | Short discription | | -------- | ------- | | Hotspots in Köln | List of urban hotspots in Köln. In addition to some information regarding this hotspots like the geometrical location , the house number and street name. | | Road sections Köln | The Köln street directory provides an overview of all applicable street names , addresses , geometry paths and house numbers in each road section |
hotspots_in_koeln_df = pd.read_sql_table('hotspots_in_koeln', 'sqlite:///../data/project.sqlite')
road_sections_df = pd.read_sql_table('road_sections','sqlite:///../data/project.sqlite')
To get an overview of the Dataset, i will visualize an overview of our tables. This overview contain the most important infromation in each table
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(hotspots_in_koeln_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Haus Nr",
hover_data=["Straßenname", "Haus Nr"],
color="Straßenname",
zoom=11,
height=1400,
width=1200)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
road_vis_df = copy.deepcopy(road_sections_df)
road_vis_df['Gemotry_path_list'] = road_vis_df['Geometry_paths'].map(lambda x: [list(map(float , element.split(','))) for element in re.findall(r"[.^[]*\[([^]]*)\]", x)] )
road_vis_df = road_vis_df.explode('Gemotry_path_list')
road_vis_df[['Geometry_x','Geometry_y']] = pd.DataFrame(road_vis_df['Gemotry_path_list'].to_list())
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(road_vis_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Straße",
color="Straße",
zoom=11,
height=1400,
width=1200)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
We need to prepare the data before we proceed with our solutions to the data science challenge.
In our database, the street names serve as the primary key. Unfortunately, the tables don't have standardized street names; for instance, they may finish in XYZ str, XYZ strasse, or other variations. Due to this, merging the two tables and processing the data to find a solution is challenging.
So , the steps of data preparation :
# Making standard street names to match with the street names from the roads table
hotspots_in_koeln_df['Straße_name'] = hotspots_in_koeln_df['Straßenname'].map(lambda x: re.split('[s|S]tr[ |a|"."]', x)[0])
# Making standard street names to match with the street names from the hotspot table
road_sections_df['Straße_name'] = road_sections_df['Straße'].map(lambda x: re.split('[s|S]tr[ |a|"."]', x)[0])
# Merging the hotspots and the road section tables
all_in_one = hotspots_in_koeln_df.merge(road_sections_df , on='Straße_name')
all_in_one['houses in the intersection'] = all_in_one[['Hausnummer von links' , 'Hausnummer bis links' ,'Hausnummer von rechts' , 'Hausnummer bis rechts' ]].values.tolist()
We filter the road section that has house number that directly have a hotspot
interesections_with_hotspots = []
for i , row in all_in_one.iterrows():
try :
# Check if the house number are already one of the houses in the intersection
if row['Haus Nr'].zfill(4) in row['houses in the intersection']:
interesections_with_hotspots.append(row)
except:
pass
interesections_with_hotspots_df = pd.DataFrame(interesections_with_hotspots)
# Combine the geometry coordinates of the hotspot in one coloum
all_in_one['hotspot_geometry'] = all_in_one.apply(lambda x: list([x['Geometry_x'],x['Geometry_y']]) ,axis=1)
# make the geomtry paths list of floats instead of string
all_in_one['Gemotry_path_list'] = all_in_one['Geometry_paths'].map(lambda x: [list(map(float , element.split(','))) for element in re.findall(r"[.^[]*\[([^]]*)\]", x)] )
interesections_with_range_of_hotspots = []
for i , row in all_in_one.iterrows():
loc1 = row['hotspot_geometry']
for loc2 in row['Gemotry_path_list']:
# check if the one of the intersection paths is in the range of the hotspot
if hs.haversine(loc1 , loc2,unit=Unit.METERS) < 45:
interesections_with_range_of_hotspots.append(row)
interesections_with_range_of_hotspots_df = pd.DataFrame(interesections_with_range_of_hotspots)
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(interesections_with_hotspots_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Haus Nr",
hover_data=["Straßenname", "Haus Nr"],
color="Straßenname",
zoom=11,
height=1400,
width=1200 )
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
pio.renderers.default = "notebook"
fig = px.scatter_mapbox(interesections_with_range_of_hotspots_df,
lat="Geometry_y",
lon="Geometry_x",
hover_name= "Haus Nr",
hover_data=["Straßenname", "Haus Nr"],
color="Straßenname",
zoom=11,
height=1400,
width=1200 )
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
We are not running the data pipeline here becasue our data has been updated , but we use a local version which i have downloaded beforehand.